A Study in Educational Prognosis



CHAPTER I 



THE PROBLEM 



Can academic success be predicted? If so, how? Three
bases of prognosis have received much consideration: College 
entrance examinations, teachers' estimates, and school marks. 

In 1906, Thorndike¹ pointed out that there was a low correlation
between the marks of pupils in college entrance examination
and their marks later in college. Adam Leroy Jones, as champion
of the college entrance examination plan, maintained that
"No advocate of examinations ever supposed that the purpose of
examinations was to furnish a prediction of what the boy would
do . . . through his college course, or indeed even through the
first year of the course."² It is certain that teachers' estimates
are not perfect in selecting the pupils who can pass the
examinations of the College Entrance Examination Board. For
example, in 1916, three fourths of the students specially
recommended by their teachers as able to pass the examinations in
American history, in mediaeval and modern history, and in civil
government, failed to make a grade of sixty per cent.³ Can
twenty-five per cent efficiency in estimating academic success
be considered successful?

1 "Future of the College Entrance Examination Board," Educational
Review, 31:5.

2 "Entrance Examination and College Records," Educational Review,
48:109, 1914.

3 Sixteenth Annual Report of College Entrance Examination Board, 1916.

Do school marks foretell academic success better than do teachers'
estimates? In spite of the different standards of marking of
different schools, of different departments, and by different
teachers, of the different emphasis placed upon different parts of
the same work, of the inability of some teachers to see small
differences, in spite of all these differences, are school marks a
more accurate basis for prognosis than teachers' estimates?
Without discussing whether Dearborn's coefficient of average or 
average of coefficients is the more suitable for his work, the con- 
clusion that he reaches may be noted : In seventy-five per cent of 
the cases the standing in the university can be predicted from the 
standing in the high school.⁴ F. O. Smith found that with 120
students at the University of Iowa there was a correlation of .53
between the average of all high school marks and all marks in the
university.⁵ Walter W. Pettit found a correlation of .63 between
the average of all high school marks and the freshman marks in
college.⁶ In the cases of 253 Harvard students, E. A. Lincoln
found that the correlation between high school standing and
standing in the college entrance examination was .46, the
correlation between college entrance examination and standing the
freshman year in college, .47, while the correlation between
high school standing and freshman college standing was .69.
Therefore, he concludes that school marks furnish a better basis
for prognosis than entrance examinations.⁷

Can school marks be considered accurate when the marks of
142 English teachers, as Starch and Elliott have pointed out,
vary in grading the same composition from 50 to 98,⁸ and the
marks of 118 mathematics teachers for the same paper in
mathematics vary from 28 to 90?⁹ Some recognition of this wide
differing is necessary in order to appreciate the extraordinary
variability in teachers' marks pointed out by F. J. Kelly.¹⁰

Unreliable as school marks may be, Truman Lee Kelley found
that, for estimating the pupil's scholastic ability, the elementary
school records of the pupil gave more accurate information than
either the teachers' estimates or the tests he devised.¹¹

4 Bulletin No. 312, High School Series No. 6, University of Wisconsin,
1909.

5 A Rational Basis for Determining Fitness for College Entrance,
University of Iowa Studies, Vol. 1, No. 3, 1910.

6 A Comparative Study of New York High School and Columbia College
Grades, Master's Essay, Teachers College, 1912.

7 School and Society, Vol. V, No. 119, p. 417, 1917.

8 School Review, 20:442-457.

9 Ibid., 21:254-259.

10 Teachers' Marks, Teachers College, Contributions to Education, 1914.

11 Educational Guidance, Teachers College, Contributions to Education,
1914.

When examinations, teachers' estimates, and school marks
have been considered, is there any other basis for predicting
academic success? Standardized educational and psychological
tests form the basis in this study for predicting a pupil's
success. On the basis of standardized tests, to what extent can a
pupil's academic success be foretold? An answer to this question
constitutes the theme of the work that follows: A Study in
Educational Prognosis.

CHAPTER II



THE EXPERIMENT 



1. Conditions Under Which the Experiment Was Made 



This study in educational prognosis concerns itself chiefly
with an experiment in organizing into homogeneous groups the
pupils who entered a junior high school. The experiment has
been made possible by Teachers College, Columbia University,
and the public school system of New York City cooperating in
the organization of the Speyer School as an experimental
academic junior high school for boys. This school, as a part of the
free public school system of New York City, opened February,
1916, with about two hundred boys who had finished the first
six grades of the regular schools. One hundred additional boys
entered in September, 1916, and fifty more entered in February,
1917. It is the first group, the one entering in February, 1916,
that forms the basis of this study.

When the two hundred pupils entered the school, an attempt 
was made to organize them for purposes of instruction into 
homogeneous groups. On account of the size of the class rooms, 
the groups were limited to twenty-five pupils each. All groups, 
when so organized, were to follow the same course of study, but 
each group was to proceed as rapidly as it was able, i.e., at its 
optimum speed. This means that the abler classes, with the 
revised and enriched course of study, with the improved method 
of instruction and of study, have the opportunity of completing 
the three years' work of the junior high school as rapidly as they 
are able, possibly in two years. If such is the case, the pupils
so doing will pass from the completion of the 6B grade to the
second year of the senior high school in two years. The virtue
of the plan lies partly in the fact that the brighter pupils are
not held back by the slower ones, and that these slower pupils
are not discouraged by being rushed beyond their best rate of
work or by being placed in groups with pupils with whom they
cannot compete.

One of the first problems in organizing the school with this 
limited number of pupils was to find to what extent the pupils 
were typical 6B boys. Before the opening of the school in
February, 1916, it was found that pupils would be sent from five
public schools: Numbers 5, 10B, 43, 184, and 186 Manhattan.
Accordingly, each teacher of each 6B class in these five schools 
was asked to rank his or her pupils separately in intelligence 
and in industry. In addition there was given to all of these 
6B boys, the Woody Multiplication Scale, the Trabue Comple- 
tion-Test Language Scales B and C, and fifty words from the 
Ayres Spelling Scale, list Q. Each pupil also wrote an English
composition on the subject, "How I Should Spend Twenty
Dollars." Thus two standards were provided by means of which
it was possible to compare those pupils who came to Speyer 
School with all the other boys of the twenty-four classes from 
which they came. This was evidently necessary in order that 
one might know to what extent the pupils included in this study 
were typical 6B boys. 

While the working out of this preliminary problem was neces- 
sary, the real problem was to classify into homogeneous 
groups, on the basis of mental ability, all the pupils present at 
the opening of the school. To do this the scores were retained
that these pupils had made in the five tests (Woody Multiplication,
Ayres Spelling, Composition, Trabue Completion-Test
Language Scales B and C) and six additional tests were then
given: Thorndike Reading Alpha 2, Part II, Thorndike Visual
Vocabulary A, and Woodworth and Wells Easy Opposites, Easy
Directions and Mixed Relations. Each of these tests was chosen
because it had shown in previous experiments a positive correla- 
tion with desirable traits. When the achievement of all pupils 
in each test had been ranked and each pupil's ranks in all tests 
had been added, it was possible by ranking these totals to state 
in a single figure where, on the basis of achievement in the eleven 
tests, each pupil stood in relation to each of the others of the 
whole group. The pupil with the highest score on the basis of 
the tests was ranked one, the second best, two, and so on. Those
ranking from one to twenty-five were placed in the first class,
A1; pupils twenty-six to fifty were placed in A2; pupils fifty-one
to seventy-five in A3, and so on to pupils one hundred
seventy-six to two hundred in A8. A pupil was not, however, 
fixed finally according to his original grouping. Whenever the 
teachers of any pupil agreed that he was in too slow or too fast 
a group, he was transferred. The fact that there were several 
groups made it possible for the teachers to make these transfers 
and still keep the classes about the original size. 
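
The grouping rule just described (ranks one to twenty-five to A1, twenty-six to fifty to A2, and so on to A8, with later transfers agreed by the teachers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pupil identifiers, the function names `assign_groups` and `transfer`, and the idea of a transfer log are hypothetical and do not come from the study.

```python
def assign_groups(composite_ranks, group_size=25, prefix="A"):
    """Map each pupil to a class label from his composite rank (1 = best).

    Ranks 1-25 fall in A1, 26-50 in A2, and so on, following the text.
    """
    groups = {}
    for pupil, rank in composite_ranks.items():
        groups[pupil] = f"{prefix}{(rank - 1) // group_size + 1}"
    return groups


def transfer(groups, pupil, new_group):
    """Move a pupil to another group and return a (pupil, old, new) record.

    The record format is an assumption; the study only says that a record
    of all transfers was kept.
    """
    old_group = groups[pupil]
    groups[pupil] = new_group
    return (pupil, old_group, new_group)


# Hypothetical entering class of two hundred pupils, already ranked 1..200.
ranks = {f"pupil_{r:03d}": r for r in range(1, 201)}
groups = assign_groups(ranks)
log = [transfer(groups, "pupil_026", "A1")]  # teachers judge him too slow a group
```

Integer division on `rank - 1` keeps the boundary pupils (25, 50, 75, ...) in the lower-numbered group, which matches the ranges given in the text.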

The criterion of prognosis in this experiment must rest finally 
in the teachers' judgments. By keeping a record of all trans- 
fers from one group to another, and by having the teachers rank 
the pupils after teaching them one year, it is possible to see 
how nearly the classification by the tests in the beginning cor- 
responds to that made by the teachers after teaching the pupils 
one year. 

In addition to the amount of statistical work involved, there 
were some limitations on including in this study all of the two 
hundred pupils of the first group that entered Speyer School. 
Due to absence from school when the first five tests were given 
in the twenty-four class rooms of the five public schools, some 
pupils missed one or more of the tests. There were in all ninety- 
seven pupils who had scores in every one of the eleven tests. 
Of these ninety-seven, seventy-four were still in school at the 
end of one year. Fortunately for this study, these seventy-four 
pupils were scattered through all of the groups from the fastest 
to the slowest. As a result of the departmental plan of teaching 
there were, aside from the teachers of drawing, music, shop, 
gymnasium, and general science, four teachers of regular aca- 
demic subjects who were teaching all of these seventy-four boys. 
These teachers, at the end of one year, ranked these pupils for 
general mental ability. When all cases have been considered, it 
will be seen that marks made in class correspond very closely to 
the estimate given by the teachers, yet each teacher was asked to 
rank the pupils on his or her own definition of general mental 
ability, and to make the ranking without consulting anyone. 
This was done. 

The data for this study therefore consist of the scores of all
the 6B boys in twenty-four classes in five New York City public
schools, in five standardized tests; the ranking by each teacher 
of these twenty-four classes, of the boys of his or her class for 
intelligence and for industry; the scores of about one hundred 
and seventy-five pupils drawn from these twenty-four classes 
of 6B boys, in eleven educational and psychological tests; the 
record of all transfers from the grouping according to these 
tests made by the Speyer teachers during the first year of their 
teaching these pupils; all school marks of seventy-four pupils of
the first six grades of the public schools; all school marks of the
same group during their first year in the junior high school; the
age of the seventy-four pupils; the ranking of seventy-four 
pupils for whom there were scores in eleven tests and who re- 
mained in school one year, by four teachers at the end of that 
year. In addition, the eleven tests were repeated at the end of 
one year, i.e., the same or similar tests were given to the seventy- 
four boys. 

2. The Tests 

Achievement in standardized educational and psychological 
tests was the basis for organizing the pupils into groups for 
purposes of instruction. The size of the class rooms limited 
these groups to twenty-five pupils each. 

Eleven tests were given in February, 1916, and a like number 
of the same or of similar tests were given to the same pupils one 
year later. The tests in 1916 were:

1. Thorndike Reading Scale A, Visual Vocabulary.¹

2. Thorndike Scale Alpha 2, For Measuring the Understanding of
Sentences, Part II.²

3. An English composition on the subject, "How I Would Spend Twenty
Dollars."⁸

4. Fifty words from the "Q" list of the Ayres Measuring Scale for
Ability in Spelling.³

5-6. Trabue Completion-Test Language Scales B and C.⁴

7-8. Woody Arithmetic Scales, Multiplication and Division, Series A.⁵

9-10. Woodworth-Wells Logical Relations Tests: Opposites II (north,
south, out), Mixed Relations II (good, bad, long).⁶

11. Woodworth-Wells Easy Directions (Cross out the smallest dot).⁶

In 1917 the tests used were:

1. Thorndike Reading Scale A2: Visual Vocabulary plus steps 11, 11½,
12, 12½ of Scale A2: Provisional Extension.⁷

2. Thorndike Reading Scale Alpha 2, Part II, repeated.²

3. An English composition on the subject, "How I Should Like to Spend
Next Saturday."⁸

4. Ayres Spelling Scale: fifty words selected from the R, S, T, U, V
and W lists.³

5-6. Trabue Completion-Test Language Scales J and K.⁴

7-8. Woody Arithmetic Scales, Multiplication and Division, Series B.⁵

9-10. Woodworth-Wells Logical Relations: Opposites I (long, soft,
white), and Mixed Relations I (eye, see, ear).⁶

11. Woodworth-Wells Easy Directions (Cross out g in tiger).⁶

1 Thorndike, "Reading Scale A, Visual Vocabulary," in Teachers College
Record, September, 1914.

2 "Scale Alpha 2, For Measuring the Understanding of Sentences," in
Teachers College Record, Vol. XVI, No. 6, November, 1915.

3 Ayres, L. P., Measuring Scale for Ability in Spelling, Russell Sage
Foundation, Division of Education.

4 Trabue, M. R., Completion-Test Language Scales, Teachers College
Contributions to Education, No. 77.

It will be noted that eight of these 1917 tests are not repetitions,
but are similar to those given in 1916. Likewise it will be
noted that Reading Alpha 2, Part II, is a repetition of the same
test, and that the Woody tests, Series B, are also a repetition of
the 1916 tests, but consist of only about half as many problems.
While the footnotes accompanying the enumeration of the tests 
indicate where the reader who is not already acquainted with 
the tests may refer to them, some description of them may be 
of value. 

The Visual Vocabulary Test, Reading Scale A, given in 1916,
consists of forty-three words, beginning with five easy words of
equal difficulty and progressing by steps of five-word groups of
increasing difficulty to the last three words which are the most
difficult of all. This test, built on the "checking by class"
principle, requires that the pupil write the letter F under every word
meaning a flower, T under every word meaning something about
time, and so on through the eight kinds of words composing the
test. The Visual Vocabulary test given in 1917, "Visual
Vocabulary Scale A2," plus the four additional steps taken from
"A2 Provisional Extension," consists of one hundred and
seventy words. It is an extension and improvement of Reading
Scale A, but maintains the same obvious purpose, i.e., to measure
how hard words a pupil can read in the sense of understanding
their meaning well enough to classify them under the proper
headings; as, an animal, a flower, something about time, etc. The
time allowed, twenty-five minutes, enabled each pupil to attempt
to place the correct letter under each word. In scoring, a credit
of one was given for each word lettered correctly. The number
of words lettered correctly constituted the score.

5 Woody, C., Measurements of Some Achievements in Arithmetic, Teachers
College Contributions to Education, No. 80.

6 Woodworth-Wells, Association Tests, in Psychological Monographs,
Vol. XIII, No. 6, December, 1911.

7 Thorndike, "Reading Scale A2, Visual Vocabulary," in Teachers College
Record, November, 1916.

8 Hillegas, M. B., A Scale for the Measurement of Quality in English
Composition by Young People, Teachers College.

Scale Alpha 2, For Measuring the Understanding of Sentences,
Part II, consists of eight paragraphs of increasing difficulty.
Each paragraph is followed by questions, usually three
or four, and the pupil's ability to understand the paragraph
is determined by his answers to these questions. In the time
allowed, twenty-five minutes, all pupils except the very slowest
were able to attempt to answer each question. In scoring, two
was given for a correct answer and one for a semi-correct answer.

The grade of the English composition on the subject, "How I
Should Spend Twenty Dollars," was determined by averaging
the marks given by four to six experienced judges, who in forming
their judgments used the Hillegas Scale. The composition,
"What I Should Like to Do Next Saturday," was graded in the
same manner except that there were four judges instead of four
to six. The time allowed the pupil for writing this composition
was thirty minutes.

In giving the Ayres Spelling test the regular teacher pro- 
nounced the words but did not grade the results. Two credits 
were given for each word spelled correctly. 

The Trabue Completion-Test Language Scales B and C consist
of ten mutilated sentences. In each sentence, from the first
one which is very easy, through the gradually increasing
difficulty of each succeeding step to the last one, which is usually
beyond the ability of the pupil, the omitted word or words are to
be supplied. While C is somewhat more difficult than B, either
test in the opinion of the author, Dr. M. R. Trabue, "measures a
class fairly well, but both taken together give a more accurate
measure of the individual." Scales J and K, consisting of seven
sentences each, are very much more difficult, and are more
equally matched than B and C. However, all four seem to
measure the same quality, whatever that quality may be. In
scoring, the correct answers published by Dr. Trabue were
followed absolutely. Two credits were allowed for each sentence
perfectly completed and one credit for each sentence almost
perfectly completed. Time: seven minutes for each test.

The Woody Multiplication Scale, Series A, consists of thirty-nine
problems. The first problem is as easy as it can be made,
but there is a gradual increase in difficulty with each succeeding
one. The Multiplication Scale, Series B, consists of twenty
problems drawn from Scale A. The Division Scale, Series A,
consisting of thirty-six problems, is constructed in the same way as the
Multiplication Scale, Series A. Division Scale, Series B, is
made up of fifteen problems drawn from Division Scale A. In
scoring, one credit was allowed for each correct answer. Time:
twenty minutes for Series A and ten minutes for Series B.

The two lists of twenty words each which compose the Opposites
Test, make it possible to give two tests, of about equal
difficulty, of the same function. The pupils were required to write
as rapidly as possible the opposite of the word appearing in the
printed list. One credit was given for each correct response.
Time: seventy-two seconds.

In the Mixed Relations Test, twenty series of three words each,
with a fourth word missing, were given. The pupil was to note
the relation of the second word to the first, and then find and
write down a word standing in the same relation to the third.
The two lists of "mixed relations" make possible the repetition
of the test without any particular interference from learning or
remembering. One credit was given for each word correctly
supplied. Time: one hundred and twelve seconds.

The Easy Directions Test makes it possible to find out the
pupil's ability and speed in understanding and following certain
instructions. The two tests of approximately equal difficulty
make the repetition of the test possible. One credit was
given for each instruction correctly followed. Time: eighty-two
seconds.
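
The credits and time limits described in this section can be gathered into one small lookup table. The sketch below is a reader's summary in Python, not part of the study: the names `ScoringRule`, `RULES`, `score`, and `composition_grade` are hypothetical, and the time limit for the Ayres spelling test is left as `None` because the text does not state one.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class ScoringRule:
    full_credit: int             # credits for a fully correct response
    partial_credit: int          # credits for a semi-correct response (0 = none)
    time_limit_s: Optional[int]  # time allowed in seconds; None if not stated


# Values taken from the descriptions of the tests above.
RULES = {
    "visual_vocabulary": ScoringRule(1, 0, 25 * 60),
    "alpha_2_part_ii": ScoringRule(2, 1, 25 * 60),
    "ayres_spelling": ScoringRule(2, 0, None),
    "trabue": ScoringRule(2, 1, 7 * 60),
    "woody_series_a": ScoringRule(1, 0, 20 * 60),
    "woody_series_b": ScoringRule(1, 0, 10 * 60),
    "opposites": ScoringRule(1, 0, 72),
    "mixed_relations": ScoringRule(1, 0, 112),
    "easy_directions": ScoringRule(1, 0, 82),
}


def score(test, n_full, n_partial=0):
    """Total credits for n_full correct and n_partial semi-correct responses."""
    rule = RULES[test]
    if n_partial and rule.partial_credit == 0:
        raise ValueError(f"{test} gives no partial credit")
    return n_full * rule.full_credit + n_partial * rule.partial_credit


def composition_grade(judge_marks):
    """Composition grade: the average of the judges' Hillegas Scale marks."""
    return mean(judge_marks)
```

For example, `score("ayres_spelling", 50)` gives 100, the ceiling for the fifty-word spelling test, and `score("trabue", 6, 1)` gives 13 for six sentences perfectly completed and one almost perfectly completed.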

3. Were the Subjects Typical 6B Boys?

The first problem that presents itself is really a preliminary 
one. It is to determine to what extent the seventy-four pupils 
of this study are typical of pupils finishing the 6B grade in New 
York City public schools. Fortunately it is possible to know 
this relationship with a considerable degree of exactness. It will 
be recalled that all the 6B boys, about seven hundred in number, 
in twenty-four class rooms of five New York City public schools, 
were given five tests. In addition, the teacher in each of these
twenty-four class rooms ranked his or her boys in intelligence
and in industry.

It was then possible, by comparing the medians, or, in the
case of spelling, the averages, of the achievement of the whole
group in the five tests given in these twenty-four rooms of the
five public schools with the achievement of the pupils of that
group who came to Speyer School, to know, on the basis
of these tests, to what extent the Speyer pupils concerned in this
experiment were typical pupils. By reading Table A under the
headings "6B Boys, Five Schools," and "Boys Who Came to
Speyer," it will be seen that the Speyer group is slightly
superior; it achieved about one-third of one point more than the
larger group in Trabue B, one-fifth of one point more in Trabue
C, about one and one-half points more in Woody Multiplication,
about two points more in Composition as represented by the
average of the grades given by judgment of from four to six
judges who used the Hillegas Scale, and about five and one-fifth
points more in spelling the fifty words of the Ayres Q list. It
is noted then that the Speyer group is, on the basis of achievement
in these five tests, somewhat better than the other group,
though only slightly better. It should also be pointed out that
the Speyer group did not cluster around the median of achievement
and that there were all kinds of pupils, from the brightest
to very nearly the dullest. On this point the estimates of the
twenty-four teachers are in accord with the tests.

TABLE A
Comparison of Median Achievements

                        6B Boys,          Boys Who Came     74 Boys
                        Five Schools      to Speyer         of This Study
                        Cases  Median     Cases  Median     Cases  Median
Trabue B                 684   12.78       171   13.17       74    13.41
Trabue C                 677   12.68       167   12.77       74    12.06
Woody Multiplication     707   31.73       170   33.30       74    33.31
Composition              694   30.39       164   32.36       74    33.9

                        Cases  Average    Cases  Average    Cases  Average
Spelling                 704   89.02       171   94.21       74    93.9
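
The comparison read off Table A amounts to a subtraction for each test. The short Python sketch below repeats that arithmetic using the figures as printed in Table A; the names `TABLE_A` and `speyer_advantage` are hypothetical. Four of the five differences agree with the amounts stated in the text; for Trabue C the printed medians differ by .09, where the text says one-fifth of one point.

```python
# (median or average of the five-school group, of the Speyer group),
# as printed in Table A; spelling is an average, the rest are medians.
TABLE_A = {
    "Trabue B": (12.78, 13.17),
    "Trabue C": (12.68, 12.77),
    "Woody Multiplication": (31.73, 33.30),
    "Composition": (30.39, 32.36),
    "Spelling": (89.02, 94.21),
}


def speyer_advantage(table):
    """Points by which the Speyer group exceeds the five-school group."""
    return {
        test: round(speyer - five_schools, 2)
        for test, (five_schools, speyer) in table.items()
    }


advantage = speyer_advantage(TABLE_A)
for test, points in advantage.items():
    print(f"{test:22s} {points:+.2f}")
```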

The ranking of his or her boys for intelligence and industry
by each teacher of the twenty-four class rooms, does not make
an accurate comparison of these pupils possible. Since there is
no way of comparing the subjective estimate of one teacher
concerning one pupil with a like estimate of another teacher of
another pupil, such evaluations have worth in rough groupings
only. However, the ranking for industry and for intelligence by
each teacher of his or her own pupils when compared with the
ranking by achievement in the five tests, was possible. A study
of the comparative ranking as made by fourteen of these teachers,
selected at random, with that made by the five tests is shown
in Table B. Here, for example, teacher number one, who ranked
pupils practically the same for intelligence and for industry, has
a fairly high correlation, .76 (Pearson formula), between
intelligence and industry combined, with the composite of the five
tests, while teacher number ten finds little relation between
intelligence and industry, .29, and a correlation of only .38 between
intelligence and industry combined, with the composite of the
tests. A glance at the medians is sufficient to show that the
easily checked-up abilities represented by multiplication and
spelling have, as a rule, a much closer relation to the teacher's
estimate of intelligence and industry than have the abilities
measured by English composition and the Trabue Completion-Test
Language Scales B and C. However, when the composite
of all the tests is considered, the relation between these teachers'
ranking for intelligence and industry combined and a composite
of these five tests varied from a correlation of .17 to one of .76,
with a median of .38 and an average of .48. On account of the
ranking of the different groups by different teachers, with no one
pupil ranked by any two teachers, it is impossible to present in
any one statistical statement the exact relation between the
ranking of the Speyer pupils by the teacher and the ranking by the
pupils' achievement in five standardized tests. However, it has
been shown by the tests that the group coming to Speyer School
made a little, but just a very little, better scores in the tests than
did the other pupils in the twenty-four class rooms in the five
schools from which they came.

TABLE B
The Relation of One Teacher's Ranking to That by
Standardized Tests

[Rows: Teachers No. 1 to No. 14, followed by Average, Median, and
Range. Entries: correlation coefficients with probable errors (P.E.)
in parentheses. The column headings and the assignment of values
to columns are not recoverable.]
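
The coefficients quoted in this section are described as Pearson correlations, and Table B prints a probable error beside each. As a rough sketch of how such figures are obtained, the Python below computes a Pearson coefficient from two rankings. The probable-error formula, 0.6745(1 - r²)/√n, is the conventional one of the period and is an assumption here, since the text does not state which formula was used; the function names and the sample rankings are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def probable_error(r, n):
    """Probable error of r for n cases (assumed formula, see lead-in)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# Hypothetical rankings of six pupils by a teacher and by the test composite.
teacher_rank = [1, 2, 3, 4, 5, 6]
test_rank = [2, 1, 3, 5, 4, 6]
r = pearson(teacher_rank, test_rank)
pe = probable_error(r, len(teacher_rank))
```

Under the assumed formula, a coefficient of .76 on a class of thirty-seven pupils would carry a probable error of about .05.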

The next step is to see how the seventy-four boys who form
the basis for this study compare with the whole group that came
to Speyer School. This can be done by comparing their scores 
in the five tests with the scores of the whole group tested in the 
twenty-four class rooms, or with the scores of all those who came 
to Speyer School, or with both. By a study of Table A, it will 
be seen that the seventy-four boys are slightly inferior to the 
whole group at Speyer School, and a very little better than the 
whole group from the five schools. The answer, then, to the 
first problem is that the achievement of the group of seventy-four 
boys studied, as shown in the results of the five tests, is a little, 
but a very little, above the average or the median achievement 
of all the boys in the twenty-four classes of the five schools from 
which they came. 

4. The Scores 

In order to rank the seventy-four boys, for their achievement
in the eleven tests, it was necessary to find a single
statistical statement that represented this achievement. On the
basis of the scores that resulted from the tests of February, 1916 
(see Table Y on page 51), each individual could be ranked 
in each test. The pupil making the highest score was ranked 
one, and the pupil making the poorest, seventy-four. When each 
individual had been ranked in each subject, his rankings in the 
eleven subjects were added, and the result was a column of totals 
representing the combined rankings of each individual in all 
tests. The rankings of these totals resulted in a single statistical 
statement of each individual's achievement in relation to the 
achievement of the seventy-three others of the group. 
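
The procedure just described (rank every pupil in each test, add each pupil's ranks, then rank the totals) can be sketched as follows in Python. The handling of tied scores is an assumption, since the text does not say how ties were treated; here tied pupils share the average of the ranks they occupy. The function names and the three-pupil example are hypothetical.

```python
def rank_descending(scores):
    """Rank a {pupil: score} mapping so the highest score gets rank 1.

    Tied pupils share the average of the ranks they occupy (an assumption).
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared
        i = j + 1
    return ranks


def composite_rank(tests):
    """tests: {test_name: {pupil: score}}, every pupil present in every test."""
    per_test = [rank_descending(scores) for scores in tests.values()]
    pupils = list(per_test[0])
    totals = {p: sum(ranks[p] for ranks in per_test) for p in pupils}
    # The smallest rank total is the best, so negate before ranking again.
    return rank_descending({p: -total for p, total in totals.items()})


tests = {
    "spelling": {"a": 90, "b": 80, "c": 70},
    "multiplication": {"a": 30, "b": 35, "c": 20},
}
final = composite_rank(tests)
```

Because every test contributes one rank per pupil, each test carries equal weight in the total, whatever the range of its raw scores.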

The eleven tests were considered at first of equal value. 
While all of these tests had been used before, so far as the 
writer knows this combination of them in testing pupils of this 
age, for this purpose, had not been made. Each of the tests 
had shown in previous experiments a positive correlation with 
desirable traits, otherwise it would not have been used; but 
the relative value of the tests for purposes of practical edu- 
cational prognosis was largely untested. Any weighting given 
to any one of these tests in the beginning of this experiment 
would have been largely guesswork. The guess made here was
that any one test was equal to any other test, and the pupils
were ranked on that basis. The value of each test for the pur- 
poses of this study is considered later. 

It is possible to correlate the ranking that resulted from the 
eleven tests with ranking by age, by grades made during the 
first six years of public school attendance, by grades made the 
first year at Speyer School, and with the ranking by four 
teachers of academic subjects after teaching the pupils one year. 

5. Age 

Age, if taken in years and months and carefully checked, 
is definite. Following the studies of T. L. Kelley, McCall and 
others, a negative correlation between age and achievement was 
to be expected. Hence the youngest pupil was ranked one and 
the oldest seventy-four. Working on the assumption that with 
pupils in the same grade the younger pupil is the brighter one, 
there is a positive correlation with all desirable traits measured 
in this study. The correlation with all school marks for the six 
years before coming to Speyer School is .57 (Pearson formula) ; 
with the composite of the eleven tests, 1916, it is .21, while with 
the composite of 1917, it is .23; with the school marks the first 
year at Speyer it is .34, and with the ranking of the teachers at 
the end of one year, .30. There is, of course, nothing startling 
about these correlations. It is to be expected that the brighter 
a pupil, the quicker he will get to junior high school or to any 
other desirable objective point in his school career. While age 
was not considered in the original grouping of the pupils in 
this study, it is evident now that it could have been used with 
possibly some profit. Since the correlation of age with the composite
of the eleven tests (.21 in 1916 and .23 in 1917) is lower
than the correlation of age with previous school marks, .57, or 
with marks at Speyer, .34, or with teachers' ranking at the end 
of one year, it seems to follow that these tests are a less effective 
measure of mental ability than the judgments of teachers, or 
it calls in question T. L. Kelley's statement that "the use, as a
measure of intelligence, of the age at which a pupil reaches a cer- 
tain grade gives the brighter pupil but a part of the credit due 
him." Otherwise, why is the correlation between youth and the 
tests not as high as that between youth and school marks?

6. School Marks

If school marks represented some definite achievement the 
difficulty of knowing their worth would be greatly simplified. 
While it can be shown by everyone who cares to try the experi- 
ment that in marking any paper there is much less agreement 
than is desirable in the subjective judgments of a group even of 
experts, yet aside from objective measurements school marks are 
one of the best standards we have of mental ability. In this 
study there has been an attempt to refine the value of marks. 
The teachers of these pupils have held that if it is desirable to 
hold a fast-moving class up to x quality in efficiency, it is like- 
wise desirable to bring a slow-moving class as near as possible 
to the same degree of efficiency. 

It seemed, therefore, since the groups move at different speeds, 
that a mark of B in Group 1 was not the same thing, when quan- 
tity as well as quality of work done is considered, as the same 
mark in Group 6. To equalize the difference caused by the 
more rapid work of the faster groups, the school marks were 
turned into figures, and the percentage of the original mark 
represented by the fraction of a school year that a class was 
ahead of or behind the expected speed (usually the work of the
middle groups) was added to or subtracted from the mark.
However, this treatment did not disturb greatly the ranking 
made by the unweighted marks. The correlation between the 
weighted and unweighted marks was .94. 
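The adjustment just described can be illustrated with a short sketch. It assumes that the fraction of a school year by which a class is ahead of (positive) or behind (negative) the expected speed is applied directly as a percentage of the original mark; the function name and the sample figures are hypothetical.

```python
def weighted_mark(mark, years_ahead):
    """Add to (or subtract from) a numeric mark the share of that mark
    equal to the fraction of a school year the class is ahead or behind."""
    return mark * (1.0 + years_ahead)


# Hypothetical example: the same mark of 80 earned in a class a quarter
# of a year ahead, in a middle group, and in a class a quarter behind.
fast_group = weighted_mark(80, 0.25)
middle_group = weighted_mark(80, 0.0)
slow_group = weighted_mark(80, -0.25)
```

Because every mark within a group is scaled by the same factor, the order of pupils inside a group is untouched; only the relation between groups shifts, which agrees with the high correlation between weighted and unweighted marks.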

Since all the teachers at Speyer taught together, were under 
the same supervision, and at teachers' meetings frequently dis- 
cussed the meaning and distribution of marks, made up and 
put into use a form of report card of their own, it was not 
difficult to turn the letters given as school marks into figures. 
In considering the marks made by the seventy-four pupils in 
the six years in many public schools under a great number of 
different teachers before coming to Speyer, it was but natural 
that the difficulty of turning the different school marks into 
figures was greater. To overcome this difficulty, various teach- 
ers from different public schools in the neighborhood of Speyer 
were asked to translate into figures the letters used in marking. 
The median value of a letter as found by this investigation was
used in translating the marks of the first six years into figures.
With this preliminary work done, the marks made the year 
previous to coming to Speyer were correlated with the academic 
marks made the first year at Speyer. The correlation was .42. 
When all marks that the pupils had made before coming to 
Speyer were correlated with all marks made in academic sub- 
jects during their first year there, the correlation was raised to 
.49. However, the correlation between the composite of the 
eleven tests given for the purpose of classifying the pupils when 
they entered the school and the marks in academic subjects 
made by these pupils during this first year in the school, was 
higher still. This correlation was .57. The tests, then, were a 
better means of prognosis for these pupils when they entered 
Speyer than were all their previous school marks. The same 
superiority of the tests over the marks for the first six years is 
shown when these marks and the 1916 tests are compared with 
the teachers' ranking after teaching the pupils one year. The 
correlation between the marks for the six years before coming to 
Speyer and the ranking by four teachers after teaching the pu- 
pils one year was .50, while that between the 1916 tests and the 
teachers' ranking was .66. 

7. Transfers 

A still more practical evaluation of the accuracy of the or- 
ganization into homogeneous groups can be arrived at by con- 
sidering the transfers made by the teachers during one year. 
It will be recalled that a group or class, due to the size of the 
class rooms, contained only twenty-five pupils in the beginning, 
and that it was necessary to maintain about that size class ; also 
that when the teachers of a pupil considered that he was in too 
slow or too rapid a group, they transferred him to a slower or 
faster class. By the final placing of the pupils as represented 
by the ranking of the teachers at the end of one year, ten pupils 
were transferred twenty-five places or more from that assigned 
them by the eleven tests given one year previously. Had the 
classes or groups contained thirty pupils instead of twenty-five, 
this would have been still smaller than the usual classes in New 
York City junior high schools or intermediate schools. With
classes of thirty there would have been only five displacements.
If it is kept in mind that the correlation between school marks 
and the composite teachers' ranking is .90, it is evident that 
these teachers did not as a rule make any great distinction be- 
tween school marks and general mental ability. For example, 
pupil number 4 as marked by the original tests was out of school 
for three weeks on account of illness, and pupil number 7 was 
out for a month with an operation for appendicitis. These two 
pupils were placed much lower than ranks 4 and 7 by the teach- 
ers at the end of one year. The purpose is not to dwell on 
what might have been had illnesses been unknown and teachers 
omniscient, but to point out (1) that the tests foretold more 
clearly than did all previous school marks the academic success 
that the pupils would make at Speyer; (2) that, by the final 
ranking of the teachers, there were only ten displacements of 
twenty-five places or more; and (3) that had the school classes 
or groups contained thirty pupils there would have been only 
five displacements. 
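Counting displacements of this kind is simple arithmetic on two rankings. The sketch below assumes that a displacement is any pupil whose rank by the tests and rank by the teachers differ by at least the class size; the function name and the four-pupil rankings are hypothetical.

```python
def count_displacements(test_rank, teacher_rank, class_size=25):
    """Count pupils whose two ranks differ by class_size places or more."""
    return sum(
        1
        for by_tests, by_teachers in zip(test_rank, teacher_rank)
        if abs(by_tests - by_teachers) >= class_size
    )


# Hypothetical ranks for four pupils (not data from the study).
by_tests = [1, 2, 3, 40]
by_teachers = [30, 2, 10, 5]
with_classes_of_25 = count_displacements(by_tests, by_teachers, 25)
with_classes_of_30 = count_displacements(by_tests, by_teachers, 30)
```

Raising the class size can only lower the count, which is why the same rankings that yield ten displacements for classes of twenty-five yield fewer for classes of thirty.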

8. Teachers' Rankings 

It is not supposed that the judgment of a teacher, even after 
teaching a pupil for one year, is one hundred per cent perfect. 
If teachers' judgments were absolutely accurate, there would 
be perfect correlation between teachers 1, 2, 3, and 4. (Table 
C) Instead of perfect correlation, however, the correlations 
between teachers' rankings vary from .87 to .45, while the aver- 
age of the correlations of teachers 1, 2, 3, and 4 with the other 
three is .69, .67, .53, and .55. Since teacher number 1 has the 
highest average correlation, .69, with the other three teachers 
and also the highest correlation with the tests, .67, for 1916 and 
.73 for 1917, there is evidently some ground for the belief that 
this teacher's judgment of the general mental ability of the 
pupils is more accurate than that of any other of the four teachers.
In the same way, since teacher number 2 has an average
correlation of .67 with the others and a correlation of .65 with
each of the composites of the tests, this teacher can be justly 
ranked as second. If the rankings of the pupils by teachers 1 
and 2 be combined and reranked on the basis of the totals, the
correlation of the combined judgments with the composite of
the 1916 tests will be .69 and with the 1917 tests, .72. The combined
judgments of teachers 1, 2, and 3 have a correlation with the
composite of the 1917 tests of .73. However, when teacher number
4 is introduced, the correlation falls to .68.

TABLE C
The Relation of One Teacher's Ranking to That of Another

                            Teacher  Teacher  Teacher  Teacher  Tests  Tests
                             No. 1    No. 2    No. 3    No. 4   1916   1917

Teacher No. 1                  -       .87      .58      .61     .67    .73
Teacher No. 2                 .87       -       .56      .58     .65    .65
Teacher No. 3                 .58      .56       -       .45     .47    .54
Teacher No. 4                 .61      .58      .45       -      .45    .32
Composite of Tests, 1916      .67      .65      .47      .45
Composite of Tests, 1917      .73      .65      .54      .32

This combination of teachers' rankings and the relation of the resultant
ranking with the ranking by the tests, is emphasized here to 
show that teachers' judgments do vary and that if the teachers 
had been selected, slightly higher correlations would have been 
found. It will be remembered that the judgments used were 
those of all teachers who had taught all of these boys in academic 
subjects. While the stressing of this point is of little impor- 
tance, the fact is to be noted that the correlation of the com- 
posite of the teachers' judgments of pupils with the composite 
of the eleven tests of 1916 is .66, and with the composite of the 
1917 tests the correlation is .68 ; also the fact that teachers 1, 2, 
3, and 4 correlate with the 1916 tests, .67, .65, .47, and .45, and
with the 1917 tests, .73, .65, .54, and .32. These facts make it
clear that the teachers individually really agree with the ranking
by tests as well as they agree with each other. This additional
point should be noted: that the correlation of the composite
of the four teachers' judgments with the composite of
the tests is decidedly higher than the average of their corre- 
lations with each other. 
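The averages quoted in this section can be re-derived from the inter-teacher correlations of Table C. The sketch below takes the entry for teachers 2 and 4 as .58, the value printed in the row for Teacher No. 2; the variable and function names are illustrative only.

```python
# Inter-teacher correlations as read from Table C (one entry per pair).
pair_correlations = {
    (1, 2): 0.87, (1, 3): 0.58, (1, 4): 0.61,
    (2, 3): 0.56, (2, 4): 0.58, (3, 4): 0.45,
}


def average_with_others(teacher):
    """Mean correlation of one teacher's ranking with the other three."""
    values = [r for pair, r in pair_correlations.items() if teacher in pair]
    return sum(values) / len(values)


averages = {t: round(average_with_others(t), 2) for t in (1, 2, 3, 4)}

# Mean of all six pairwise correlations, for comparison with the .66 and
# .68 reported between the composite teachers' judgment and the tests.
overall_average = round(
    sum(pair_correlations.values()) / len(pair_correlations), 2
)
```

On these figures the four averages come to .69, .67, .53, and .55, and the mean of all six pairs to about .61, which is below the correlation of the composite judgment with either composite of tests.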

9. The First Question Answered

The first question proposed in this study was: "To what
extent is the attempt at educational prognosis made on the basis
of eleven certain standardized educational and psychological
tests in agreement with the judgment of four teachers after
teaching the pupils tested for one year?" This question has
now been answered. With temporary illnesses and all the vary- 
ing interests that come in one year to the boy of twelve or thir- 
teen, in the opinion of four teachers at the end of one year only 
ten pupils, on the basis of twenty-five in a class, had been orig- 
inally placed in too high or too low a class, and, on the basis of 
thirty in a class, only five. Further, the success, as represented 
by school marks, of seventy-four boys just a very little better 
than the median pupil in twenty-four 6B class rooms of five 
New York City public schools, was more accurately predicted 
by eleven standardized tests than by all the pupil's previous 
marks combined. 



CHAPTER III

FOR THE PURPOSE OF EDUCATIONAL PROGNOSIS,
WHAT TESTS ARE OF MOST VALUE? STANDARDS

With the first problem concerning the possibility of making 
an educational prognosis by means of standardized tests an- 
swered, in so far as the data of this study permit, there arises 
the question of evaluating these tests for the purpose set forth 
in this problem. Which of these tests, how many tests, and 
what combination of them must the practical administrator 
give in order to arrive at as good results or even better than 
those reached in this study? It is not maintained that eleven 
tests are a sufficient number ; the more measures of equal value, 
the better. Neither is it maintained that tests that can be 
given to a whole group at one time are more accurate in making 
a diagnosis of the pupil than are tests which can be given to 
only one subject at a time. The complete study of a single in- 
dividual would occupy a lifetime. However, the problem here 
is to select from the eleven tests used those tests which, with 
due regard to economy of the pupils' time and ease in scoring, 
the administrator can use in organizing, for purposes of in- 
struction, the entering classes in the junior and senior high 
schools. 

Seven standards are proposed for evaluating these tests : 

1. The correlation of a test with itself, or with a similar test, 
repeated with the same pupils one year after the first test 
is given. 

2. The correlation of each test with the composite of the 
eleven tests. 

3. The correlation of each test with each of the other ten 
tests separately. 

4. The correlation of each test with the judgments of four 
teachers after teaching the pupils for one year.

5. The correlation of each test with all the school marks the
pupil made during his school life before he reached the 
junior high school. 

6. The correlation of each test with the school marks in all 
academic subjects during the first year in junior high 
school. 

7. The correlation of each test with the age of the pupil. 

Since all the tests, either the same or similar ones, were re- 
peated in the present study, all of the standards proposed above, 
with the exception of number 1, have been worked out twice. 
In addition to this repetition, in order to correct each test for 
attenuation, each 1916 test was correlated with each 1917 test. 
At this point, when each of these seventy-four pupils had partici- 
pated in several hundred correlations, certain tests were selected 
as being the best for the purposes of this study. These tests, 
as will be seen, are further correlated and combined so that the 
administrator may know the degree of efficiency he may expect 
for the number of minutes invested in measuring the pupil. 
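The correction for attenuation mentioned above is not printed in this chapter. A minimal sketch, assuming the familiar Spearman form in which the observed correlation between two measures is divided by the geometric mean of their reliabilities (here, the correlation of each test with its own repetition), is given below. The function name and the coefficients are hypothetical and are not figures from the study.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Spearman's correction: observed r divided by the geometric mean
    of the two reliability coefficients r_xx and r_yy."""
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliability coefficients must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical coefficients chosen only to show the arithmetic.
corrected = correct_for_attenuation(0.30, 0.64, 0.25)
```

A low reliability inflates the corrected value considerably, which is one reason for preferring the raw coefficients when the aim is to tell an administrator what he may expect in practice.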

It is not maintained, of course, that all of these standards are 
of equal value. For example, it is possible that standard 1 may 
be of slight value. The justification of correlating each test with 
age may be called in question. However, in every case in this 
study, which includes only 6B pupils, when the youngest pupil 
has been ranked 1, and the oldest 74, there has been a positive 
correlation between youth and the ranking by tests, by school 
marks, or by teachers' ranking. Standard 3 probably should 
not be considered of too great value if it were too much opposed 
to standard 2. The question may be raised as to the reason for 
not making more use of the coefficients that result from the cor- 
rection for attenuation. The value of the elimination of chance 
error as represented by this process has not been overlooked, but 
it is believed for the purposes of the present study that it is 
safer to depend on what the administrator in using tests will 
have to depend on — ^the raw coefficients. One further question 
is considered. When the best one, two, or three or half dozen 
tests have been selected, it is possible to find out what would 
have been the result if these tests and these only, instead of the 
eleven tests, had been used in making the original prognosis.

1. The Tests Repeated

As has been pointed out, the tests given in February, 1916, 
were repeated one year later, February, 1917. Of these 1917 
tests, it will be recalled that one test, Reading Alpha 2: Part II,
was the same test, that the Woody tests of 1917 were the same 
as the 1916, but that only about half as many problems were 
used. The other eight tests while not identical were, as is 
pointed out on pages 9 and 10, similar. In correlating the score 
of each 1916 test with its corresponding test in 1917 (see Tables 
Y and Z, pages 51-53) the Spearman method was used. Ac- 
cording to the formula 

$$\rho = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$

$\rho$ is the measure of the correlation, n = the number of paired
related measures, D = the difference in rank of the subject in
the measures correlated, and $\sum D^2$ = the sum of the differences
squared, or, to put it more concisely, $\sum D^2$ = "the sum of the
squares of the differences between the two numbers denoting
the relative positions of the two related measures in their
respective series." Since it is necessary to have the coefficient in
terms of r, the Pearson formula, in order to employ the formula
for correction for attenuation, the coefficients worked out by
the Spearman method have been in every case transmuted,
according to the table¹ for inferring the value of r from any given
value of $\rho$, into coefficients in terms of the Pearson formula.
The reliability of the coefficients derived is, of course, depend- 
ent on the number of cases. In this study it will be recalled 
that there are seventy-four pupils or cases. The P.E., then, as
the "median of the differences between the separate measures
and their central tendency," shows the measure of reliability.
By the formula,

$$\mathrm{P.E.}_r = \frac{.6745\,(1 - r^2)}{\sqrt{n}}$$

n = the number of cases and r = the coefficient of correlation.
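The computations described in this section can be sketched as follows. The rank-difference coefficient follows the Spearman formula named in the text. The conversion to r assumes Pearson's relation r = 2 sin(πρ/6), which is taken here to be what the conversion table tabulates. The probable error assumes the standard expression .6745(1 − r²)/√n. Function names and sample values are hypothetical.

```python
import math


def spearman_rho(rank_a, rank_b):
    """Rank-difference correlation: 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    n = len(rank_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


def rho_to_r(rho):
    """Infer Pearson r from rho (assumed relation, not quoted from the text)."""
    return 2 * math.sin(math.pi * rho / 6)


def probable_error(r, n):
    """Probable error of a correlation coefficient r based on n cases."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Hypothetical ranks of five pupils on a test and on its repetition.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 3, 5, 4])
r = rho_to_r(rho)
pe = probable_error(0.50, 74)  # seventy-four cases, as in this study
```

With seventy-four cases a coefficient of .50 carries a probable error of roughly .06, so differences of a few hundredths between coefficients should not be pressed.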

1 Thorndike, E. L., Mental and Social Measurements, Table 36, p. 168,
1913 edition.

In order to be sure to have the same response from two situations,
the stimulus and the situations must be exactly the same.
As applied to the tests repeated here, "the same response"
would mean perfect correlation between the 1916 and 1917 tests.
The correlation, however, is by no means perfect. Some of the 
factors that keep one from expecting too high a correlation be- 
tween the two tests demand consideration. The tests repeated, 
as we have seen in the portion of this study devoted to a dis- 
cussion of the tests given, while always similar, were not always 
the same ones. Trabue B and C undoubtedly measure the same 
abilities as Trabue J and K, yet the tests are by no means of the 
same difficulty. While the directions given to pupils, when 
Reading Alpha 2: Part II was given in 1917, were the same
as those given for this test in 1916, no one can be sure that 
the mental set of the pupils was the same. In fact, one 
can be sure that it was not the same. In 1916 these pupils 
were not accustomed to taking tests of this kind, while a year 
later they counted it a dull day that they did not have a chance 
to measure themselves by some objective standard. Then, too, 
while the pupils tested were the same ones, a whole year had 
elapsed between the tests and their repetition. The opportunity
for practice, and especially practice in groups that stimulated
one to progress at one's optimum speed, had increased rather
than equalized the individual differences that existed in
1916. While it is not the object of this study
to stress the enormous changes that took place in one year in 
boys twelve and thirteen years of age, yet, in considering the 
correlation between the 1916 and 1917 tests, it is necessary to 
recognize that great changes at this period are possible. There 
are in all probability errors other than those which correction
for attenuation can eliminate. This point, however, will be discussed
later. The fact that the year elapsing between tests and
their repetition brought changes in the pupils tested, that the 
mental set of these pupils was different, and that the tests were 
not in all cases exactly the same, must be taken into account in 
considering the correlations presented in Table D.

It is not possible to compare accurately these raw coefficients
of correlation with the work of any other investigators of whom 
the writer knows, for in other cases either these tests have not 
been given, or they have not been given a year apart, or to boys 
of twelve and thirteen. As they stand here, if Trabue B is 
paired with J, and C with K, the tests rank, in degree of correlation,
according to the figures in parentheses following the
coefficients of correlation in the total.

The worth of this standard is problematical. Should a test 
repeated with the same pupils after one year have a high cor- 
relation with itself? For example, English Composition had 
not received nearly the emphasis in the first six grades of the 
public school that it did in the year between these tests. If 
each pupil had improved, say twenty per cent, during the year, 
the ranking in composition by the 1916 test would not have been 
disturbed; but the improvement in each pupil's case was not, in 
comparison with the other pupils, a certain per cent of his original
ability. If it had been, the correlation when corrected for
attenuation would have been more nearly perfect. Such, how- 
ever, as will be seen later, is not the case. It is possible that if 
this subject, English Composition, had received still more em- 
phasis during the time between the tests, the correlation between 
the two tests would have been still lower. While the pupils 
had had much practice in composition, the opposite is true re- 
garding the type of test represented by Visual Vocabulary. To 
be sure, the pupils had been learning new words, but they had 
had no practice in writing F under a word that means a flower, 
or T under a word indicating something about time, yet the 
correlation in the case of Composition is .32, and in Visual Vo- 
cabulary, .56. Spelling had received great attention during the 
first six years and the study of this subject was continued dur- 
ing the year between the tests, yet the correlation is but .52. 
Differences in tests, in mental set, in physical condition, in the 
lapse of one year, furnish some of the explanations of the low 
correlations, and at the same time call in question the worth of 
this standard in determining the value of a test for purposes of 
prognosis. 

2. The Correlation of Each Test with the Composite

The second standard proposed is the correlation of each test 
with the composite of the eleven tests. The method of doing 
this has already been explained. Exactly to what extent the 
various tests measure different mental traits it is not yet pos- 
sible to say. However, since all tests used have been found to 
correlate positively with desirable mental abilities for academic 
work, it seems fair to assume that all the tests as combined in 
the composite give a more accurate evaluation of the pupil's 
general mental ability than does any one of these tests singly. 
Therefore, it follows that the correlation of each test with the 
composite furnishes some means of evaluating the relative merits 
of the different tests. Since the tests have been repeated, a pos- 
sible check on the value of a test as determined by its correla- 
tion with the composite, is furnished. If the possible reasons 
for causing the high or the low correlation of a test with itself 
when repeated, are held in mind, the comparison of the correlation
of each test in 1916 and in 1917 with its composite will
be of value. 

It will be noted in Table E that with the exception of Opposites
and Easy Directions, the tests occupy somewhat the same
relative positions in 1916 and in 1917. Visual Vocabulary, for
example, which ranked 1 in 1916, becomes 3 in 1917, Reading
changes from rank 3 in 1916 to rank 4 in 1917, and so on. If
the ranks for each test for each year are added and these totals 
ranked, and if the correlation of each test with its composite in 
1916 be added to its correlation with its composite in 1917, and 
these totals ranked, and then these two totals added and ranked, 
the tests will stand in the relative positions expressed by the 
column of figures in parentheses in Table E. 

TABLE E

Correlation of Each Test With Its Composite

                          1916          1917       Rank

Visual Vocabulary          .73           .69       (1)
Reading                    .63           .67       (2)
Composition                .51           .50       (8)
Spelling                   .53           .54       (7.5)
Trabue B                   .45     J     .63       (7.5)
Trabue C                   .59     K     .65       (3)
Trabue B and C             .65     J&K   .76
Woody Multiplication       .26           .36       (9)
Woody Division             .26           .30       (10)
Opposites                  .49           .70       (4)
Easy Directions            .58           .52       (5.5)
Mixed Relations            .55           .54       (5.5)

It will be observed that while it is possible and probably just, 
in considering the Trabue tests, to pair B with J and C with K, 
yet it might have been wiser in the beginning to have combined 
B and C and J and K. This doubling the length of the test
makes these completion tests take a relatively higher position.
It is apparent from the correlations, according to the standard
considered here (the degree of correlation of a test with its
composite), that Visual Vocabulary, Reading, Opposites, and
Trabue Completion are the four tests of greatest value for pur- 
poses of educational prognosis. 

There is, however, a source of error in all these correlations 
which makes them higher than they should be ; this is especially 
true of the combination made of the Trabue tests. The composite
is composed of eleven separate tests, and when any one of
these tests is correlated with this composite, it is, to a certain 
extent, correlated with itself. Since the Trabue tests B and C, 
likewise J and K, enter into the make-up of the composite as two 
separate tests, they have, when combined, a double interest in 
the composite. Statistically, it would be equally just to combine
Visual Vocabulary and Reading. This combination has a
correlation with the composite of .76 in 1916 and .74 in 1917. 
Since eleven tests enter into the composition of the composite, 
each test would seem to have an interest of one-eleventh, and 
when two tests are combined after the composite has been made 
up, as was done with Trabue B and C and also with J and K, 
such a combination would have an interest of two-elevenths in 
this composite. Investigators as a rule have not made any correction
for this correlation of a test with the composite of which
it is a part. Manifestly, however, such a correction is of value.
One of the ways of making this correction that suggests itself
is to make a composite of ten tests and find the correlation between
this composite and the eleventh test. This method is not
only exceedingly laborious but evidently partly unjust. The 
test withdrawn from the composite in order to correlate it with 
the other ten tests, measures some phase of general mental abil- 
ity, and with this test withdrawn the composite is proportion- 
ately less perfect. However, this method has been followed. 
Each of the eleven tests has been correlated with the composite 
of the other ten. 
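The difference between correlating a test with a composite that contains it and with the composite of the remaining tests can be seen in a small sketch. It assumes a composite formed as the plain sum of the separate measures for each pupil; the function names and the three short series are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def with_and_without_self(tests, k):
    """Return (r of test k with the full composite,
               r of test k with the composite of all other tests)."""
    n_pupils = len(tests[0])
    full = [sum(test[i] for test in tests) for i in range(n_pupils)]
    others = [full[i] - tests[k][i] for i in range(n_pupils)]
    return pearson_r(tests[k], full), pearson_r(tests[k], others)


# Hypothetical measures for five pupils on three tests.
tests = [[1, 2, 3, 4, 5], [2, 1, 4, 3, 5], [5, 3, 1, 4, 2]]
r_full, r_others = with_and_without_self(tests, 0)
```

In this example the first test correlates about .84 with the composite of all three but only about .34 with the composite of the other two. With eleven tests each one holds a smaller share of the composite, so the drop is smaller, but it is of the same nature.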

The results of the correlation of every test with the composite 
of the other ten tests, as presented in Table F, show that the 
correlations as presented in Table E have been reduced about 
.16 in 1916 and about .13 in 1917. The reductions, when this 
method is used, for each individual test as shown by the figures 
in parentheses in Table F, vary in 1916 from .13 to .21, and in 
1917 from .07 to .18. It will be noted also that this reduction 
is not a fixed percentage of the original correlation, the correla- 
tion of the test with the composite of the eleven tests, but that 
the amount of reduction has a slight tendency to be larger when 
the original correlation is smaller. This correction for the cor- 
relation of a test with itself does not materially affect the order
of tests as ranked in Table E. However, it does raise again the
question as to the extent of common elements in the various tests. 

TABLE F

Correlation of Each Test With the Composite of All
Tests Except Itself

(Figures in parentheses show the reduction from Table E.)

                          1916               1917

Visual Vocabulary          .60 (.13)          .56 (.13)
Reading                    .47 (.16)          .56 (.11)
Composition                .35 (.16)          .36 (.14)
Spelling                   .40 (.13)          .37 (.17)
Trabue B                   .29 (.16)     J    .50 (.13)
Trabue C                   .45 (.14)     K    .53 (.12)
Woody Multiplication       .09 (.17)          .22 (.14)
Woody Division             .06 (.20)          .17 (.13)
Opposites                  .28 (.21)          .63 (.07)
Easy Directions            .42 (.16)          .41 (.18)
Mixed Relations            .37 (.18)          .41 (.13)

3. The Correlation of Each Test with Every Other Test

The correlation of each test with every other test proceeds on 
the assumption that each of these unweighted tests is equal to 
any other test. By the combination of the seven standards set 
up for evaluating a test, this, as is pointed out later, is found to 
be untrue. Such value as the correlation of each test with every 
other test has, is not to be neglected; but if this standard is in 
conflict with the corrected correlation of each test with its com- 
posite as presented in Standard 2, its value would certainly be 
questionable. 

By Tables G and H it is possible in addition to knowing the 
correlation of every test with every other test and the relations 
between these correlations for the 1916 and 1917 tests, to know, 
also, how this standard of the average correlation of each test 
with the ten other tests composing its composite, when the 1916 
and 1917 averages are combined, compares with the second 
standard set up, the correlation of each test with its composite.
In combining the 1916 and 1917 tests by ranking the combined 
totals of the ranks arrived at, first, by ranking each test accord- 
ing to the total correlations with every other test in both 1916 
and 1917, and, second, by ranking the totals given by adding 
the ranks of each test in 1916 and 1917, it is found that the 
tests according to this standard stand in the following order with
the highest first: Visual Vocabulary, Reading, Opposites,
Trabue C-K, Easy Directions, Mixed Relations, Spelling, Trabue
B-J, Composition, Woody Multiplication, and Woody Division. 
By comparing this ranking of tests just given in the order of 
their importance for educational prognosis with the correspond- 
ing order arrived at by the correlation of each test with its com- 
posite, Table E, it will be seen that the order of the tests is 
almost the same. 

TABLE G 
Each 1916 Test With Every Other 1916 Test

The determination of the value of a test for educational prognosis,
however, is not the whole question here; if it were, the
whole problem would be greatly simplified. The question is not 
only what tests are best for the purposes of prognosis, but what 
combination of tests is desirable. If every test in Tables G and
H had a correlation of +1. with every other test, there would
be no need of giving eleven tests; one would do as well as all
combined. In such a case it would be evident that all tests had
measured the same function. Visual Vocabulary, Reading, and
Completion Tests tend to measure at least closely related func- 
tions, as is shown by their correlations with each other. A test 
that has shown positive correlation with desirable traits and 
has a low correlation with every other test, evidently measures 
a function not measured by these other tests. This accounts 
for the negative correlation of the Woody tests in Tables G and
H. In measuring a group it is, of course, desirable to measure 
as many traits as possible. Thus a test that measures traits 
not closely related to those measured by the other tests will have 
a low correlation with the other tests, and at the same time be 
the test that should be included in the combination of tests used 
for the purpose outlined in this study. On this basis the test 
with the lowest correlation in Tables G and H has been ranked
one, and the test with the highest correlation, eleven. 

4. The Correlation of Each Test with the Judgment of
Four Teachers


As has been pointed out, the criterion of prognosis in this 
experiment had to rest in the teachers' judgments. It is not 
believed that these judgments are always correct. In fact, one 
can be sure that some of them at least are incorrect, for the 
average correlations of each of the four teachers' judgments with 
those of the other three, as has been pointed out, are .69, .67, .53, 
and .55 instead of +1., as they would be if the teachers were 
omniscient. However, such virtue as lies in this study in spite 
of such imperfections as may exist in material, method, or indi- 
vidual judgments, is due largely to the fact that it is a study of 
a practical working experiment. Since such is the case, teacher-
judgments are accepted as they are without theorizing as to what
they might be.

It will be recalled that the correlation of the Teachers' Ranking
with the composite of the 1916 tests is .66 and with that of
the 1917 tests, .68. It is not to be expected that any one test
will reach as high a correlation with the Teachers' Ranking as
the composite of all the tests, unless some of these tests are
worthless for the purpose considered here, or worse than
worthless, or unless some one test is so much a compound of many
tests as to measure a very great number of mental traits that
correlate positively with those mental traits that make for
academic success. Each of the eleven tests, for both 1916 and
1917, as shown in Table J, has been correlated with the Teachers'
Rankings. By inspection, those tests which correlate highly with
the Teachers' Ranking can be easily picked out. However, to
arrive at a definite statement, some statistical method is
necessary. Whether the tests are ranked for 1916 and for 1917
and the totals of these ranks are then ranked, or the tests are
ranked by the average of the correlations of each test in 1916
and 1917, there is very little change in relative position, and
none in that of the six highest correlations. By adding the
rankings by each method and ranking the totals, the tests stand
in the order indicated by the figures in parentheses: (3), (2),
(4), (1), etc. It will be observed that in this ranking Trabue B
and C have been combined, and also J and K. Thus there are only
ten tests. If, instead of making this combination, C is paired
with K, and B with J, the resultant ranking, by the method just
explained, is that shown by the figures (3), (1.5), (4), etc.;
the only difference so far as these Completion Tests are
concerned is that C-K takes the place of the longer tests. The
point is often rightly urged that lengthening a test tends to
raise its correlation. Lengthening a test, however, means that
it takes more time for the subjects to take it, and likewise a
longer time for the administrator to score it. For theoretic
purposes, time is not of so great value; but for practical use,
if C-K will give as satisfactory a result as will B and C and J
and K, then according to the standard now being considered for
evaluating a test, C-K is to be preferred.

TABLE J

Correlation of Tests with the Ranking by Four Teachers

                                              Rank,           Rank,
                                              Trabue B and    Trabue B
                                              C combined,     paired with J,
                                              J and K         and C
                           1916      1917     combined        with K

Visual Vocabulary           .44       .43       (3)             (3)
Reading                     .47       .47       (2)             (1.5)
Composition                 .37       .49       (4)             (4)
Spelling                    .37       .60       (1)             (1.5)
Trabue B                    .18     J .40                       (9)
Trabue C                    .38     K .24                       (6)
Trabue B and C              .37   J&K .36       (6)
Woody Multiplication        .25       .35       (7.5)           (8)
Woody Division              .20       .35       (9)             (10)
Opposites                   .36       .50       (5)             (5)
Easy Directions             .35       .27       (7.5)           (7)
Mixed Relations             .22       .18       (10)            (11)
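
The ranking procedure described for Table J (rank each year and rank the totals, rank by the two-year average, then add both rankings and rank again) can be sketched as below. This is only an illustration: the figures are the Table J correlations as transcribed here, in hundredths, with Trabue B paired with J and C with K; the function names are hypothetical, and giving tied tests the mean of their places is an assumption inferred from the half-ranks printed in the table.

```python
# Correlations with the teachers' ranking, in hundredths, as read from
# Table J. Integers avoid floating-point noise when testing for ties.
R1916 = {"Visual Vocabulary": 44, "Reading": 47, "Composition": 37,
         "Spelling": 37, "Trabue B-J": 18, "Trabue C-K": 38,
         "Woody Multiplication": 25, "Woody Division": 20,
         "Opposites": 36, "Easy Directions": 35, "Mixed Relations": 22}
R1917 = {"Visual Vocabulary": 43, "Reading": 47, "Composition": 49,
         "Spelling": 60, "Trabue B-J": 40, "Trabue C-K": 24,
         "Woody Multiplication": 35, "Woody Division": 35,
         "Opposites": 50, "Easy Directions": 27, "Mixed Relations": 18}


def average_ranks(values, best_is_highest=True):
    """Rank the keys of `values`; tied entries share the mean of their places."""
    order = sorted(values, key=values.get, reverse=best_is_highest)
    ranks, i = {}, 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j + 2) / 2  # mean of the 1-based places i+1 .. j+1
        for name in order[i:j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


def combined_ranking(first_year, second_year):
    tests = list(first_year)
    rank_a = average_ranks(first_year)
    rank_b = average_ranks(second_year)
    # Method 1: rank each year, add the two ranks, rank the totals.
    by_rank_sum = average_ranks({t: rank_a[t] + rank_b[t] for t in tests},
                                best_is_highest=False)
    # Method 2: rank by the two-year average (the sum gives the same order).
    by_average = average_ranks({t: first_year[t] + second_year[t] for t in tests})
    # Final step: add the rankings from both methods and rank the totals.
    return average_ranks({t: by_rank_sum[t] + by_average[t] for t in tests},
                         best_is_highest=False)


final = combined_ranking(R1916, R1917)
for test, rank in sorted(final.items(), key=lambda item: item[1]):
    print(f"{rank:>4}  {test}")
```

Under these assumptions the sketch reproduces the right-hand rank column of Table J, including the tie between Reading and Spelling at 1.5.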

It may be recalled that in Table B, where the rankings by
fourteen teachers, each one ranking his or her own pupils, were
compared with the ranking by the four tests, Spelling and
Arithmetic correlated about twice as high with the Teachers'
Rankings as did Composition and the B and C Completion Tests. An
inspection of Table J shows that while the Arithmetic tests do
not rank so high as in Table B, Spelling leads the list. Right
or wrong, the ability that enables a pupil to spell well plays
an important part in forming a teacher's conception of mental
ability. In contrast to this important place maintained by
Spelling, the Arithmetic tests are here among those that have
the lowest correlations with the Teachers' Rankings. Plainly the
order of the first half-dozen tests according to the standard
now under consideration is: Spelling, Reading, Visual
Vocabulary, Composition, Opposites, and the Completion Tests.



5. Relation of Each Test to School Marks during the
First Year of the Junior High School

Since the correlation of teachers' judgments and the school
marks is so high, .90, it seems evident that those boys who do
their school work well are, in the opinion of the teachers, the
abler mentally. No such close relation is found between the
tests and school marks. This correlation for 1916 is .57, and
for 1917, .55. Since the two standards, school marks and
teachers' judgments, are both subjective, and since the four
teachers whose combined judgments determine the rankings of the
pupils gave about half the marks, a close correlation between
these two standards is to be expected. In all probability, a
wider range of mental traits is represented in teachers' marks
than is measured in any one of the eleven tests. Thus, while the
correlation between the school marks in the first year of the
junior high school and the composite of the 1916 tests is .57,
and with the composite of the 1917 tests, .55, the correlation
of school marks of the first six years with the composite of the
1916 tests is .29, and with the composite of the 1917 tests,
.32. As can be seen in Table K, the correlations of the
individual tests with the school marks during the first year of
the junior high school range from .43 to .16 in 1916, median
.29, and in 1917, from .56 to .09, with a median of .34. As has
been pointed out, teachers' judgments and school marks are often
variable; yet, outside of objective measurements, they are the
best measures of general intelligence that we have. It follows,
therefore, that school marks should receive some consideration
in evaluating a test.

TABLE K

Correlation of Each Test, 1916 and 1917, with School Marks in
Academic Subjects during the First Year of
Junior High School

                           1916      1917     Rank

Visual Vocabulary           .34       .32     (5.5)
Reading                     .43       .37     (2)
Composition                 .32       .38     (3)
Spelling                    .32       .56     (1)
Trabue B                    .16     J .29     (9.5)
Trabue C                    .31     K .09     (9.5)
Woody Multiplication        .28       .39     (5.5)
Woody Division              .27       .34     (7)
Opposites                   .26       .47     (4)
Easy Directions             .29       .18     (8)
Mixed Relations             .17       .18     (11)

In studying Table K, it will be noted, when all tests are
considered, that the average correlation of the 1917 tests is
slightly higher than that of the 1916 tests. There is at the
same time great variation in the ranking of the tests: Visual
Vocabulary, which ranked 2 in 1916, is ranked 7 in 1917;
Opposites has jumped from 9 to 2; and Reading, with a change of
only .06 in correlation, has dropped from first to fifth place.
By the method of ranking already explained, the tests stand,
when school marks are considered as a standard for evaluation,
in the order indicated by the figures in parentheses in the
right-hand column. As will be noted, Spelling holds first place,
as it did with the Teachers' Rankings. Reading is second,
Composition third, Opposites fourth, with Visual Vocabulary and
Woody Multiplication tied for the next position. While the order
is varied, the first half-dozen tests here are the same as those
selected by the preceding standards, except that Woody
Multiplication takes the place often held by the Completion
Tests.
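
The summary figures quoted for Table K (the range and median for each year, the slightly higher 1917 average, and the shifts in place of Visual Vocabulary, Opposites, and Reading) can be checked with a few lines. The sketch below assumes the correlations as transcribed from Table K, entered in hundredths; the helper name `year_rank` is hypothetical, and it does not average ties because none of the three tests it is applied to is tied.

```python
from statistics import mean, median

# Correlations with first-year junior high school marks, in hundredths,
# as read from Table K: (1916, 1917). Trabue B is paired with J, C with K.
TABLE_K = {
    "Visual Vocabulary": (34, 32), "Reading": (43, 37),
    "Composition": (32, 38), "Spelling": (32, 56),
    "Trabue B-J": (16, 29), "Trabue C-K": (31, 9),
    "Woody Multiplication": (28, 39), "Woody Division": (27, 34),
    "Opposites": (26, 47), "Easy Directions": (29, 18),
    "Mixed Relations": (17, 18),
}

col_1916 = [pair[0] for pair in TABLE_K.values()]
col_1917 = [pair[1] for pair in TABLE_K.values()]


def year_rank(test, year):
    """Place of `test` within one year's column, 1 = highest correlation."""
    value = TABLE_K[test][year]
    return 1 + sum(1 for pair in TABLE_K.values() if pair[year] > value)


for label, column in (("1916", col_1916), ("1917", col_1917)):
    print(label, "range", max(column), "to", min(column),
          "median", median(column), "mean", round(mean(column), 1))

for test in ("Visual Vocabulary", "Opposites", "Reading"):
    print(test, year_rank(test, 0), "->", year_rank(test, 1))
```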

The tests used were selected, as has been pointed out, because 
in previous experiments they had had positive correlations with 
desirable traits as shown by academic success, and because it 
was believed that this general mental ability under right direc- 
tion would express itself in the school work. Hence a positive 
relation was expected between the tests and school marks. If 
the school work to be done had been other than that of an aca- 
demic junior high school, it is conceivable that some other or 
some additional tests might have been selected. 

6. The Correlation of Each Test with All School Marks
Made during the First Six Years

Perhaps no absolutely positive statement concerning the relation
between the pupil's mastery of the "tool subjects" in the first
six grades, as usually taught, and his general mental ability,
can be made. One is certainly justified, it would seem, in
believing that the pupil with mental ability would master such
subjects as the four fundamentals in arithmetic and thus rank
high according to the marks that he received as a result of
doing this work well. When it is recognized that in this study
the correlation of all school marks made prior to entering the
junior high school with all marks made during the first year
after entering is .49, while that of the composite of the 1916
tests is .57, and, at the same time, that the correlation of the
marks for these first six years with the Teachers' Ranking at
the end of the first year of the junior high school is .50,
while that of the 1916 tests is .66, it is evident that for
predicting academic success the tests were superior to the sum
of all previous marks. The recognition of this fact calls in
question the value of this standard of previous school marks for
evaluating a test, and especially marks given in so many grades
of so many schools by so many teachers.

Since the work of the first six grades probably must be, and
certainly is, on the "fundamentals," the ranking of the tests by
this standard of all marks previous to the junior high school is
not surprising. Thus in Table L, the tests rank, beginning with
the highest: Spelling, Woody Division, Composition, Trabue B-J,
Woody Multiplication, with Reading and Opposites tied for the
sixth place, followed by Visual Vocabulary, Mixed Relations,
Easy Directions, and Trabue C-K. Plainly there is, with the
exception of Spelling, a marked reversal in the order of the
tests from what has been found in the other standards.

TABLE L

The Correlation of Each Test for 1916 and 1917 with All
School Marks below the Junior High School

                           1916      1917     Rank

Visual Vocabulary           .15       .24     (8)
Reading                     .22       .20     (6.5)
Composition                 .21       .29     (3)
Spelling                    .40       .36     (1)
Trabue B                    .16     J .33     (4)
Trabue C                    .13     K .07     (11)
Woody Multiplication        .29       .18     (5)
Woody Division              .19       .37     (2)
Opposites                   .16       .28     (6.5)
Easy Directions             .10       .15     (10)
Mixed Relations            -.02       .24     (9)

The high position of Spelling and Arithmetic can be easily
understood: these subjects had received emphasis in the first
six grades. Certainly the brighter pupils should master the
fundamentals of arithmetic better than the dull ones, and
likewise spell better. However, does the habit of making a fixed
response to a situation, instead of freeing the mind for other
things, interfere, for the time during which the habit is being
fixed, with meeting entirely new situations? In 1916 and again
in 1917, the Multiplication tests correlated negatively with
Easy Directions. Such tests as Reading and Visual Vocabulary
which, according to other standards so far considered, rank
high, are here in the second division. By this standard of marks
for the first six years, those tests involving fixed responses
rank much higher than those involving new situations. Aside from
these observations, since the school marks under consideration
have a correlation of .49 with all marks in the first year of
the junior high school, and of .50 with the rankings by four
teachers, and since all of these marks had a correlation of only
.29 with a composite of the 1916 tests and an average
correlation with all the tests of only .18, it is not believed
that this standard is of much value in determining the worth of
a test.

7. The Correlation of Each Test with the Age of the Pupil 

Since retardation and at least comparative acceleration play
some part in every school system, it is to be expected that
within a grade the younger pupils have the greater mental
ability. Hence, as has been pointed out, the youngest pupil has
been ranked 1, and the oldest, 74. Yet from the data presented
in Table M, it is seen that there is a very low correlation
between youth and the tests. The average of the correlations of
all tests for 1916 with youth is .13, and for 1917, .17, while
the correlation of the composite of all the tests for these
years, 1916 and 1917, with youth is .21 and .23. The bright
young pupils evidently attracted the favorable attention of the
various teachers during the first six years, for the correlation
between the marks for the first six years and youth is .57.
However, these comparatively accelerated pupils did not succeed
quite so well, as judged by school marks, during the first year
of the junior high school. Here the correlation between school
marks and youth is not .57 but .34, and the correlation with the
Teachers' Ranking at the end of one year is .04 lower.

TABLE M

Correlation of Each Test with Youth within a School Grade

                           1916      1917     Rank

Visual Vocabulary           .04       .27     (7)
Reading                     .17       .16     (3)
Composition                 .16      -.03     (9)
Spelling                    .27       .27     (2)
Trabue B                    .12     J .25     (5)
Trabue C                    .07    K -.02     (10.5)
Woody Multiplication        .27       .29     (1)
Woody Division              .19       .08     (6)
Opposites                  -.03       .26     (8)
Easy Directions             .04       .04     (10.5)
Mixed Relations             .09       .26     (4)

In analyzing Table M, it will be noted that the tests selected
by the standard of youth are, first of all, those preferred by
Standard 6: Arithmetic and Spelling. It should be noted,
however, that Visual Vocabulary jumped from rank 9.5 in 1916 to
2.5 in 1917, and Opposites from 11 to 4.5. The data of this
table might suggest also, since such tests as those just
mentioned, involving situations new to these boys in 1916, were
so much better met in 1917, that comparatively these brighter,
younger pupils adjusted themselves, when once the "tool
subjects" had been mastered, to new situations more rapidly than
the older pupils. In any case, youth within a grade does
correlate positively with the average of the 1916 and 1917 tests
and with all teachers' estimates, either marks or rankings.
Therefore, youth must be of value as a standard for evaluating a
test; but since these correlations are so low, it is not of
great value. Young pupils, so far as this study is concerned,
stand higher in the sympathetic estimates of their early
teachers than in the unfeeling ranking by objective tests.
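
The half-ranks quoted for Table M (Visual Vocabulary moving from 9.5 to 2.5, Opposites from 11 to 4.5) arise when tied tests share the mean of the places they occupy. A minimal sketch of that convention, together with the yearly averages of .13 and .17, is given below. It assumes the correlations as transcribed from Table M, in hundredths; the function names are hypothetical.

```python
# Correlations of each test with youth within a grade, in hundredths,
# as read from Table M: (1916, 1917).
TABLE_M = {
    "Visual Vocabulary": (4, 27), "Reading": (17, 16),
    "Composition": (16, -3), "Spelling": (27, 27),
    "Trabue B-J": (12, 25), "Trabue C-K": (7, -2),
    "Woody Multiplication": (27, 29), "Woody Division": (19, 8),
    "Opposites": (-3, 26), "Easy Directions": (4, 4),
    "Mixed Relations": (9, 26),
}


def tied_rank(test, year):
    """Rank within one year (1 = highest); ties share the mean of their places."""
    value = TABLE_M[test][year]
    higher = sum(1 for pair in TABLE_M.values() if pair[year] > value)
    equal = sum(1 for pair in TABLE_M.values() if pair[year] == value)
    # The tied group occupies places higher+1 .. higher+equal.
    return higher + (equal + 1) / 2


def mean_correlation(year):
    """Average correlation for one year, converted back from hundredths."""
    return sum(pair[year] for pair in TABLE_M.values()) / len(TABLE_M) / 100


for test in ("Visual Vocabulary", "Opposites"):
    print(test, tied_rank(test, 0), "->", tied_rank(test, 1))
print("mean 1916:", round(mean_correlation(0), 2),
      "mean 1917:", round(mean_correlation(1), 2))
```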

8. Summary of All Raw Coefficients of Correlation 

Before considering the coefficients corrected for attenuation,
it will probably be convenient for the reader to have all the raw
coefficients for all tests and for all standards set up, presented
as concisely as possible. At the expense of some necessary
repetition, they are brought together in Tables N and O. Table P
presents the average of all correlations compiled from Tables
N and O. In these tables, Trabue B, 1916, is paired with
Trabue J, 1917, and likewise C, 1916, with K, 1917. If C is
combined with B and J with K, the correlation between B-C
and J-K is .41.

9. Corrected Coefficients of Correlation

Since chance inaccuracies in the paired measures correlated
do not render each other harmless but tend to produce zero
correlation, it is necessary to correct the raw coefficients. As
either the same tests or tests similar to those given in 1916 had
been repeated one year later, the two independent measures
necessary for this correction for "attenuation" due to chance
errors are at hand. By utilizing the raw Pearson coefficients
of correlation of Table Q, it is possible to present the corrected
coefficients in Table R. The formula¹ used is

$$r_{pq} = \frac{\sqrt{r_{p_1 q_2}\, r_{p_2 q_1}}}{\sqrt{r_{p_1 p_2}\, r_{q_1 q_2}}}$$

If Visual Vocabulary and Reading are the measures to be
related, let A equal the former and B the latter. Let $p$ be a
series of exact measures of A, and $q$ be the related series of
exact measures of B. Let $r_{pq}$ be the coefficient of
correlation of A and B, obtainable from the two series $p$ and
$q$. $r_{pq}$ is thus, according to this theory that errors are
due to chance errors in the data, the required true coefficient.
Let $p_1$ and $p_2$ be two independent series of measures of A,
and $q_1$ and $q_2$ two independent series of measures of B. Let
$r_{p_1 q_2}$ be the correlation when the first measure of A and
the second measure of B are used, and $r_{p_2 q_1}$ be the
correlation when the second measure of A and the first measure
of B are used. Let $r_{p_1 p_2}$ be the correlation between the
two measures of A, and $r_{q_1 q_2}$ the correlation between the
two measures of B. Of course a test could be split and the odd
responses, for example, be correlated against the even, but this
was not necessary here as the eleven tests were repeated after
one year.
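
The correction can be sketched in a few lines, assuming the formula cited from Thorndike is the usual form of the correction for attenuation: the geometric mean of the two cross-correlations divided by the geometric mean of the two repeat correlations. The function name and the input values below are hypothetical and are not taken from Tables Q or R; returning `None` for zero or negative raw coefficients is an assumption meant to mirror coefficients left wanting for that reason.

```python
from math import sqrt
from typing import Optional


def correct_for_attenuation(r_p1q2: float, r_p2q1: float,
                            r_p1p2: float, r_q1q2: float) -> Optional[float]:
    """Estimate the correlation between the exact measures p and q.

    r_p1q2, r_p2q1: cross-correlations between the two traits.
    r_p1p2, r_q1q2: correlations between the two measures of each trait.
    Returns None when any raw coefficient is zero or negative, since the
    square roots are then undefined or the estimate is meaningless.
    """
    if min(r_p1q2, r_p2q1, r_p1p2, r_q1q2) <= 0:
        return None
    return sqrt(r_p1q2 * r_p2q1) / sqrt(r_p1p2 * r_q1q2)


# Hypothetical raw coefficients, for illustration only.
print(round(correct_for_attenuation(0.40, 0.50, 0.60, 0.70), 3))
# Low repeat correlations can push the corrected value above 1.
print(round(correct_for_attenuation(0.40, 0.50, 0.30, 0.40), 3))
# A negative raw coefficient leaves the corrected coefficient wanting.
print(correct_for_attenuation(-0.02, 0.24, 0.60, 0.70))
```

The second call shows why corrected coefficients above unity appear whenever the two measures of the same trait agree poorly with each other.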

In Table R, since some raw coefficients were either zero or
negative, there are some coefficients wanting. Also, since in
some cases $r_{p_1 p_2}$ and $r_{q_1 q_2}$ were very low, some
corrected coefficients are 1+.* Due to this and to the additional
fact that the practical administrator must depend on raw
coefficients, more use has been made in this study of the raw
than of the corrected coefficients.

1 Thorndike, E. L.: Mental and Social Measurements, p. 179, 1913 edition.

Vis. Vocab.      .79     .88
Reading          .79     .80
Composition      .88     .80
Spelling         .60     .80     .59
Trabue B-J       .71     .51     .86
Trabue C-K      1.02*    .79    1.19*
Woody Mult.      .22     .20     .30
Woody Div.       .87
Opposites        .76     .76    1.04*
Easy Direc.      .79     .70     .77
Mixed Rel.       .52     .68     .85



10. Selection of Tests

The present evaluation of tests involves two chief questions: 
First, the evaluation of individual tests for the purpose of edu- 
cational prognosis, and, second, the combination of tests to use 
in such an experiment as this study has recorded. Of the seven 
standards proposed, pages 21 and 22, standards 2 and 4 are 
considered of most worth, and the ranking of the tests as given 
under two and four in Table U is believed to be more nearly 
correct than that of any of the other combinations of standards. 
In every combination of standards presented in Tables U and V, 
Reading, Visual Vocabulary, Opposites, and Spelling come in 
the first division of the whole group of tests. The practical 
administrator can add the Completion and the Arithmetic tests 
to this list of four tests if he desires to extend his testing beyond 
seventy-five minutes. 

If all of these tests correlated +1. with each other, there would
be no need of giving more than one of them. Evidently such a
correlation would indicate that the tests measured the same
traits. Since nearly all of these are language tests, it is to be
expected that the Arithmetic tests would have a low correlation
with the composite and a low correlation with every other test.
This fact that the Arithmetic tests, which have been found to
have a positive correlation with desirable traits, do have a low
average correlation with every other test, probably indicates
that they measure some abilities that the others do not measure.
If such is the case, the test that has the lowest correlation with
every other test should be ranked one, and the test with the
highest average correlation, ranked eleven. Such a ranking has
been made in standard 3 of Table S as ranked in Table V.
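
Ranking the least intercorrelated test first amounts to turning an existing rank column upside down: with eleven tests, a rank r becomes 12 - r, and tied ranks stay tied. The sketch below illustrates this under two assumptions that the text does not state outright: that the column for standard III in Table T ranks the most highly intercorrelated test first, and that the ranks are as transcribed here (the entries for Easy Directions and Mixed Relations are uncertain readings of a damaged page). The function name is hypothetical.

```python
# Ranks under standard III as read from Table T (assumed: 1 = highest
# average correlation with the other tests). The values 5 and 6 for
# Easy Directions and Mixed Relations are uncertain readings.
STANDARD_III = {
    "Visual Vocabulary": 1, "Reading": 2, "Composition": 9, "Spelling": 7,
    "Trabue B-J": 8, "Trabue C-K": 4, "Woody Multiplication": 10,
    "Woody Division": 11, "Opposites": 3, "Easy Directions": 5,
    "Mixed Relations": 6,
}


def reverse_ranks(ranks):
    """Invert a ranking so that the last place becomes first; ties are preserved."""
    n = len(ranks)
    return {test: n + 1 - rank for test, rank in ranks.items()}


reversed_iii = reverse_ranks(STANDARD_III)
for test, rank in sorted(reversed_iii.items(), key=lambda item: item[1]):
    print(f"{rank:>3}  {test}")
```

On this reading the two Woody tests move from the bottom of the list to the top, which is the sense in which the Arithmetic tests gain standing when a combination of tests, rather than a single test, is being chosen.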

TABLE T

Ranking of Tests by All Standards

Rank by Standards:    I     II    III   IV    V     VI    VII   VIII  IX

Visual Vocab.         1     1     1     3     5.5   8     7     10    9
Reading               2.5   2     2     1.5   2     6.5   3     10    10
Composition           10    9     9     4     3     3     9     10    11
Spelling              2.5   7.5   7     1.5   1     1     2     8     8
Trabue B-J            7.5   7.5   8     9     9.5   4     5     4.5   6.5
Trabue C-K            11    3     4     6     9.5   11    10.5  4.5   6.5
Woody Mult.           5     10    10    8     5.5   5     1     6.5   4.5
Woody Div.            6     11    11    10    7     2     6     6.5   4.5
Opposites             4     4     3     5     4     6.5   8     1     1
Easy Direc.           9     5.5   5     7     8     10    10.5  2     3
Mixed Rel.            7.5   5.5   6     11    11    9     4     3     2

TABLE U

Rank by Standards:    II and IV   I to IV   I to V   I to VII

Visual Vocab.            ..          1         2        3
Reading                  1           ..        1        2
Composition              ..          8.5       7        6
Spelling                 4           4         3        1
Trabue B-J               8.5         8.5       10       7
Trabue C-K               4           5         5        10.5
Woody Mult.              10          9         8        5
Woody Div.               11          10        11       9
Opposites                4           3         4        4
Easy Direc.              ..          6         6        10.5
Mixed Rel.               8.5         7         9        8
TABLE V

Rank by Standards:    I to IV   I to V   I to VII

Visual Vocab.           1.5       3         3
Reading                 1.5       2         2
Composition             6         5         6
Spelling                3         1         1
Trabue B-J              8.5       9.5       8
Trabue C-K              8.5       9.5       11
Woody Mult.             5         6         4
Woody Div.              7         7         7
Opposites               4         4         5
Easy Direc.             9         8         10
Mixed Rel.              10        11        9

The results, whether the grouping of standards be one to four,
one to five, or one to seven, call attention to the fact that in a
combination of tests for educational prognosis, the Arithmetic
tests hold a relatively higher place than they do when considered
as in Table T.

In selecting a test, two other standards must be taken into
account with those already considered: economy of the pupil's
time in taking the test, and economy of the administrator's time
in scoring it. Standard 8 in Table S indicates the relative time
consumed by the pupils in taking the tests, and standard 9
indicates, likewise, the relative time necessary to score the
tests. In any practical experiment, these two standards must be
considered. For example, regardless of the importance of
Composition as a test, it is very difficult to use it. The
variability in grading even by skilled persons using an
objective scale is so great that the same paper must be read by
three or more persons and their scores averaged, in order to
secure an approximately accurate grade. Next to Composition in
time required both for taking the test and for scoring it, come
Visual Vocabulary and Reading. However, in each of these cases
the scoring requires the reading of only one person, and this
score can be approximately accurate. In speed of giving and ease
of scoring, Opposites, Mixed Relations, and Easy Directions are
easily at the head of the list.

11. Correlation of Combinations of Tests with Teachers' 
Ranking and with Composite of Eleven Tests 

For the administrator who, for any reason, does not wish to
use all eleven tests considered in this study, it has been
pointed out that Visual Vocabulary, Reading, Opposites,
Spelling, the Completion Tests, and Woody Multiplication are the
tests he can use to greatest advantage. Table W indicates the
success that would have been met with in this study if these
tests had been used in the order mentioned. In reading this
table it should be held in mind that the correlation of the
composite of all eleven tests with the Teachers' Ranking was,
for the 1916 tests, .66, and for the 1917 tests, .68. The
correlation of Visual Vocabulary with the composite of eleven
tests in 1916 was .73, in 1917, .69; with the Teachers' Ranking,
1916, .44, 1917, .43. The corresponding figures for Reading are
.63, .67, .47, .47. The average correlation of Visual Vocabulary
with the composite then is .71, and with the Teachers' Ranking,
.43. Reading is somewhat lower with the composite, .65, and
somewhat higher with the Teachers' Ranking, .47. Therefore, the
average correlations of Visual Vocabulary and Reading are .68
with the composite and .45 with the Teachers' Ranking. However,
when Visual Vocabulary and Reading are combined, as in Table W,
the correlation is not that of the average of correlations, .68
with the composite and .45 with the Teachers' Ranking, but is
raised in 1916 to .77 and .54; i.e., the combination has raised
the correlation about .10 in each case.

The Completion Tests, which probably measure somewhat the same
qualities as Reading and Visual Vocabulary, could have been used
so far as their correlations with the composite are concerned.
Thus J and K combined have a correlation of .76 with the 1917
composite. This is as high as that of Reading and Visual
Vocabulary combined, and these tests can be given more quickly
and scored more easily than can Reading and Visual Vocabulary.
However, instead of having a correlation of .54 with the
Teachers' Ranking, as Reading and Visual Vocabulary have in
1916, the Completion Tests when combined have a correlation of
.36 with the Teachers' Ranking. Since teacher-judgments must
play so large a part in a practical experiment, the reason for
using Visual Vocabulary and Reading instead of the Completion
Tests is apparent.

The average correlations of Reading, Visual Vocabulary, and
Opposites with the composites of 1916 and of 1917 and with the
Teachers' Rankings are .65 and .44. But when these three tests
are combined as one test the correlations are raised from .65
and .44 to .82 and .57 in 1916. If to the three tests just
mentioned Spelling is added, the average correlation for all
four tests for both years is .62 with the composite and .46 with
the Teachers' Ranking; while the four tests combined as one test
have correlations of .88 and .64 for 1916 and 1917 combined.
These four tests then lack only .03 of having as high a
correlation with the Teachers' Ranking as do the whole eleven
tests, and, at the same time, they have a correlation with the
composite of .87. On the basis of this experiment, this is the
result that may be expected from a little less than one and one
quarter hours' testing. A further refinement, as is shown in
Tables W and X, can be had by using the additional tests
indicated.
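
Why a combination of tests correlates more highly with a criterion than the average of its parts can be illustrated with a standard identity for an equally weighted sum of standardized scores: the correlation of the sum with the criterion is the sum of the separate correlations divided by the square root of the sum of all entries in the matrix of test intercorrelations. The sketch below is only an illustration of that identity. How the scores were actually combined for Table W is not stated in this section, and the intercorrelation of .42 between Visual Vocabulary and Reading is a hypothetical value, not a figure from Tables G and H.

```python
from math import sqrt


def composite_correlation(r_with_criterion, r_between_tests):
    """Correlation of an equally weighted sum of standardized tests with a criterion.

    r_with_criterion: each test's correlation with the criterion.
    r_between_tests: square matrix of test intercorrelations, ones on the diagonal.
    """
    numerator = sum(r_with_criterion)
    denominator = sqrt(sum(sum(row) for row in r_between_tests))
    return numerator / denominator


# 1916 correlations of Visual Vocabulary and Reading with the Teachers' Ranking.
separate = [0.44, 0.47]

# Hypothetical intercorrelation of .42 between the two tests.
print(round(composite_correlation(separate, [[1.0, 0.42], [0.42, 1.0]]), 2))

# If the two tests correlated +1 with each other, combining them would
# give only the plain average of the two separate correlations.
print(round(composite_correlation(separate, [[1.0, 1.0], [1.0, 1.0]]), 3))
```

The lower the correlation between the tests combined, the larger the gain from combining them, which is the reason given earlier for preferring tests that measure functions not measured by the others.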

According to the classification by these four tests, Reading,
Visual Vocabulary, Opposites, and Spelling, had the classes in
1916 been composed of thirty pupils, there would have been,
according to the Teachers' Rankings one year later, only eight
displacements. That is, when all temporary illnesses on the
part of the pupils, ranging from "bad colds" through contagious
diseases to a month in the hospital, all fortunes or misfortunes
in the home life, barring the withdrawal of the pupil from
school, all the changing physical conditions and varying
interests in boys of eleven to thirteen, and a score of other
influences that might be enumerated are considered, the use of
these tests in one and one quarter hours' testing at the
beginning of the year would have agreed with the classification
of the teachers after teaching the pupils one year in ninety per
cent of all cases.
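
Counting displacements can be sketched as follows: each pupil is assigned to a class by his rank on the tests and again by his rank in the teachers' judgment, and a displacement is a pupil whose two classes differ. The six pupils and their ranks below are hypothetical, the function names are illustrative, and whole-number ranks are assumed. The last lines apply the same arithmetic to the study's own figures of 74 pupils and eight displacements, which gives about 89 per cent, stated in the text as ninety.

```python
def class_of(rank, class_size):
    """Index of the class a pupil falls in when classes are filled in rank order."""
    return (rank - 1) // class_size


def displaced_pupils(test_rank, teacher_rank, class_size):
    """Pupils whom the two rankings would place in different classes."""
    return [pupil for pupil in test_rank
            if class_of(test_rank[pupil], class_size)
            != class_of(teacher_rank[pupil], class_size)]


# Hypothetical ranks for six pupils sorted into classes of three.
by_tests = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
by_teachers = {"a": 2, "b": 1, "c": 4, "d": 3, "e": 5, "f": 6}

moved = displaced_pupils(by_tests, by_teachers, class_size=3)
print(moved, "displaced;", 1 - len(moved) / len(by_tests), "agreement")

# The study's own figures: 74 pupils, eight displacements.
agreement = (74 - 8) / 74
print(f"{agreement:.1%}")
```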

12. Conclusion 

1. In this study, academic success in the first year of junior 
high school was more successfully predicted by a group of 
standardized tests than by all previous school marks or age or 
teachers' estimates. 

2. The tests in the order of their importance for the purposes
of this study, when the administration and scoring of the
tests are considered, have been found to be: Reading, Visual
Vocabulary, Opposites, Spelling, Completion Tests, Arithmetic
Tests, Easy Directions, Mixed Relations, and Composition.

TABLE Y

..       88      100
31      100      106
32       84      121
33       88      129
34       46       97
35       90      106
36       96       96
37       88      189
38       84      117
39       74      117
40       54      118
41       92      145
42       88      125
43       74      186
44       80      148
45       80      128
46       88      142
47       82      144
48       90      117
49       96      186
50       92      111
51       92      158
52       90      121
53       70      118
54       94      129
55       82      115
56       88      128
57       76      144
58       96      118
59       92      142
60       92      129
61       86      104
62       90      138
63       92      154
64       72      108
65       94      109
66       88      120
67       66      112
68       78      187
69       98      140
70       70       78
71       66       87
72       80      106
73       94      188
74       98      118

TABLE AA
Ranking by Teachers After Teaching Pupils



I 








1 13.5 

2 38 

8..« 45.5 

4 13.5 

6 8.5 

6 2.5 

7 35.5 

8 28.5 

9 17 

10 45.5 

11 81 

12 2.5 

18 62 

14 63.6 

15 62 

16 69 

17 70 

18 72 

19 40 

20 7 

21 74 

22 62 

23 71 

24 40 

25 10.6 

26 49.5 

27 57 

28 69 

29 69 

80 28.6 

81 45.5 

32 13.5 

83 10.5 

84 40 

35 64.5 

86 59 

87 56 

88 13.5 

89 33 

40 69 

41 49.5 

42 20.6 

43 20.5 

44 40 

45 49.6 

46 59 

47 5 

48 5 

49 45.5 

50 55 

61 17 

52 40 

58 35.5 

54 24.5 

65 40 

66 40 

67 24.5 

68 8.5 

59 2.5 

60 68.5 

61 20.6 



4.6 
70 
65 
15.5 
55.5 
22.5 
65.5 
15.5 
34 
65.5 
84 

9 
70 
84 
84 
84 
70 
66 
65.5 

4.6 
65 
15.5 
22.5 
65.5 
15.5 
55.5 
55.6 
65.5 
48 

4.5 
55.6 

1 

9 
70 
55.5 
47 

4.5 

9 
34 
70 
34 
22.6 
65.5 
55.6 
70 
74 
43 
15.6 
43 
65.5 
15.5 
43 
48 
84 
55.5 
34 
22.5 
65.6 
27 
48 
84 



47 

46 

61 

29 

60 

56 

62 

38 

11 

25 

20.6 

26.5 

64 

14 

66 

29 

68 

71 

87.5 

5 
52.6 
89 

8 
48 

8.5 
48 
60 
68 
18 
12.5 
24 

8 
22.5 
72 
74 
45 
16 
85 
15 
73 
42 
70 
55 
81.5 
65 
40.6 
28 

8 
20.5 

8 
86 
26.9 
67 
52.5 
84 
54 
22.6 

1 
12.5 
67 
44 



o 



42 
58 
70 

4 
72 
40 
62 
26 
20 
29 
18 
88 
68 
11 
51 

8 
41 
62 
45 

8 
49 
44 

7 
48 

1 
87 
65 
67 
15 
18 
27 
12 
84 
66 
69 
55 
26 
47 
86 
74 
17 
88 
64 
30 
54 
24 
85 

6 
22 

9 
82 
31 
68 
43 
66 
89 
28 

2 
14 
67 
69 






48 
50 
51 
27 
78 
62 
64 
86 
16 
21 
81 
80 
67 
29 
63 
16 
66 
56 
64 

6 
67 
65 

8 
28 

2 
82 
58 
62 
28 
20 
88 

9 
87 
72 
74 
45 

8 
42 
22 
60 
18 
46 
68 

7 
61 
86 
26 
10 
34 
14 
12 
19 
65 
89 
59 
40 
88 

4 
25 
70 
41 



CO 



& 


o 


^ 


s 


•s 


•s 




i 



66 
57 
45 
28 
22 
70 
68 
11 
85 
64 

8 
18 
51 
44 
40 
26 
19 
68 
62 
12 
69 
18 
16 
81 

4 
27 
68 
64 
17 
28 

5 
30 
74 
47 
71 
68 

6 
46 

7 
66 
72 
88 
62 

2 
84 
41 
86 
14 
88 

9 
20 
24 
61 
37 
25 
21 
82 
49 

8 
48 
42 



66 

9 
46 
68 
46 
87 
63 
81 
12 
18 
19 
44 
48 

6 
65 

2 
86 
74 
89 

5 
85 
26 

1 
61 

8 
83 
55 
60 
20 
67 
28 
11 
26 
67 
61 
40 
10 
16 
41 
50 
78 
68 
62 
80 
88 
72 
24 

7 
54 
16 
56 
29 
48 
70 
34 
62 
14 

4 
17 
71 
28 

TABLE AA (Continued)





Ranking by age, youngest pupil ranked 1, oldest ranked 74


llLA.GH£Bi 


S AFTEB '. 

lis 

31.5 

8.5 
58.5 
87.5 

8 
49 
69 
80 
51 
40.5 
58.5 

2 
19 


L1LA.GHIN< 

• 

1 

9 
b^ 

21 
28 
58 
46 
6 
61 
71 
16 
60 
50 
73 
10 
19 


} J:T7Pn.fl 

i 

•g 

24 

1 
44 
58 

5 
48 
69 
17 
71 
47 
49 
11 
18 


I 

CO 

• 

o 

• 

15 
1 
78 
55 
10 
29 
60 
50 
67 
89 
59 
48 
65 




1 

62 


45 

•»^ ^ ._ 

2 

9 
22.5 
27 
22.5 

9 
15.5 
27 
55.5 
70 
84 
48 
15.5 


• 

1 

82 


63 


49.5 


27 


64 


73 


42 


65 


24.5 


22 


66 


62 


8 


67 


24.5 


69 


68 


20.5 


64 


69 


17 


21 


70 


66 


58 


71 


64.5 


47 


72 


28.5 


59 


78 


28.6 


18 


74 


88 


49 



o.L 


